Five comparisons were made relative to the quality of
estimates of ability parameters and item calibrations obtained from
the one-parameter and three-parameter logistic models. The results
indicate: (1) The three-parameter model fit the test data better in
all cases than did the one-parameter model. For simulation data sets,
multi-factor data were less well fit than single-factor data. (2) The
one-parameter model ability estimates shared more variance with the
item responses than did the three-parameter model. (3) There was no
difference in the concurrent validity for small samples between the
two models in predicting classroom achievement tests. (4) The
three-parameter model required larger samples for calibration than
did the one-parameter model. (5) The ability estimates from the two
models correlated highly for most of the data sets. The one-parameter
model is preferred for use with small sample data; but the goodness
of fit data reflected a different point of view when accurate
estimation of item parameters is important. The three-parameter model
fit all data sets better than the one-parameter model. Data sets from
the Missouri School and College Ability Tests, and from undergraduate
course final examinations were used to illustrate the models.
(Author/CTM)
Since the development of the three-parameter logistic model by
Birnbaum (1958), and the independent production of a simpler, one-parameter
logistic model by Rasch (1960), there has been an ongoing debate concerning
the relative merits of the two models. The debate stems from the need to
make the very restrictive assumptions of equal discrimination and no
guessing for test items using the one-parameter model, while the
three-parameter model requires cumbersome estimation procedures for calibration.
The purpose of the research presented here is to evaluate the relative
merits of the two models for item calibration and ability estimation,
resulting in a clarification of the above issue.

Two studies have already been done to compare the one- and
three-parameter models (Hambleton & Traub, 1971; Urry, 1977), but these studies
were limited in the scope of their comparisons. The research done by
Hambleton & Traub (1971) compared the information functions and relative
efficiency of the one-, two-, and three-parameter logistic models
for item calibration using simulated test items. Their results showed that
the three-parameter model was more informative than the one-parameter model,
although the relative efficiency of the one-parameter model to the
three-parameter model was high until the range of discrimination in the items
became large.



Paper presented at [illegible]. [illegible] Training Research Programs of the
Office of Naval Research.
The research performed by Urry (1977) also depended upon simulated
test data. Urry compared the quality of ability estimates obtained from
the one-, two-, and three-parameter logistic models when the discrimination
and guessing parameters of the simulated items were varied. The criterion
used for evaluating the models was the correlation between tailored testing
ability estimates obtained from the models and the true ability used to
operate the simulations. His results showed that the one-parameter logistic
model was seriously affected by the presence of guessing in the simulated
items and was also affected, to a lesser extent, by the variation in
discrimination parameters.

Both of these studies reflect negatively on the one-parameter
logistic model, although the practical importance of the deficits present
in the model has not been made clear. The conclusion drawn on the basis
of these studies very obviously would be to recommend the usage of the
three-parameter model for true-to-life applications. However, the
generalizability of the simulation results to live testing situations
can be questioned, particularly in that the simulation studies used very
idealized item pools and errors induced in the calibration process were
not a factor. Therefore, it is the purpose of the research reported here
to extend the comparison of these two models to real data with reasonable
sample sizes and to evaluate the models on both theoretical and practical
grounds.

Models and Programs 
In evaluating these two latent trait models for use in item calibration
and ability estimation, the models themselves cannot be separated from the
computer programs used to compute item and ability parameter estimates.
What is theoretically optimal may yield poor results because the
program used to estimate the parameters is inaccurate. Seven one-parameter
and six three-parameter logistic model calibration procedures were reviewed
before selecting the two procedures used in this study. Descriptions of
the thirteen procedures and the selection process are given in Reckase (1977).

One-parameter logistic model

The one-parameter logistic model in exponential form is given by the
formula:

$$ P\{x_{ij} \mid \theta_j, b_i\} = \frac{e^{x_{ij}(\theta_j - b_i)}}{1 + e^{(\theta_j - b_i)}} \tag{1} $$

where $x_{ij}$ is Person j's score on Item i, $\theta_j$ is the ability parameter
for Person j, and $b_i$ is the difficulty parameter for Item i. All items
are assumed to be equally discriminating by this model and guessing is
assumed to have no effect on the item score. The model also assumes a
unidimensional latent trait and local independence. The values of
the ability and difficulty parameters in this model range from positive
to negative infinity.

The program used to estimate item and ability parameters for the
one-parameter logistic model is based on the program written by Wright
& Panchapakesan (1969), and was obtained from Jerry Durovic of the
New York Civil Service Department. Although the basic procedures used
in the program are those developed by Wright & Panchapakesan, it has been
extensively modified by the author so the responsibility for its
accuracy lies there.
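
The one-parameter formula above can be sketched in a few lines of Python. This is only an illustration of the formula, not the calibration program described in the text; the function name `p_1pl` and the example values are assumptions made for this sketch.

```python
import math


def p_1pl(x, theta, b):
    """Probability of item score x (0 or 1) under the one-parameter logistic model.

    theta is the person's ability and b is the item's difficulty.
    """
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return math.exp(x * (theta - b)) / (1.0 + math.exp(theta - b))


# A person whose ability equals the item difficulty has an even chance of success.
print(p_1pl(1, 0.0, 0.0))  # 0.5

# The probabilities of the two possible scores sum to one (up to rounding).
print(p_1pl(1, 1.0, 0.0) + p_1pl(0, 1.0, 0.0))
```

Because only the difference between ability and difficulty enters the exponent, every item has the same slope, which is the equal-discrimination assumption stated in the text.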

Three-parameter logistic model

The three-parameter logistic model is given by the formula

$$ P\{x_{ij} = 1\} = c_i + (1 - c_i)\,\frac{e^{D a_i(\theta_j - b_i)}}{1 + e^{D a_i(\theta_j - b_i)}} \tag{2} $$

where $x_{ij}$ is Person j's score on Item i, $c_i$ is the guessing parameter
for Item i, $D$ is the constant 1.7 used to make the logistic ogive similar
to the normal ogive, $a_i$ is the discrimination parameter for Item i,
$\theta_j$ is the ability parameter for Person j, and $b_i$ is the difficulty
parameter for Item i. This model assumes local independence and a
unidimensional test, but it does not place any restrictions on the guessing
and discrimination parameters as does the one-parameter model. The
range of the ability and difficulty parameters of this model is from
positive to negative infinity, the same as the one-parameter model.
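
A minimal Python sketch of the three-parameter formula follows. The function name `p_3pl` and the parameter values are hypothetical and chosen only to show how the guessing and discrimination parameters change the curve; it is not the LOGIST program.

```python
import math

D = 1.7  # scaling constant that brings the logistic ogive close to the normal ogive


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model.

    theta: ability, a: discrimination, b: difficulty, c: guessing parameter.
    """
    z = D * a * (theta - b)
    # 1 / (1 + exp(-z)) is algebraically the same as exp(z) / (1 + exp(z)).
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# At theta = b the probability is halfway between c and 1.
print(p_3pl(0.0, 1.0, 0.0, 0.2))

# For very low ability the probability approaches the guessing parameter c.
print(p_3pl(-10.0, 1.0, 0.0, 0.2))
```

Setting `c = 0` and `a = 1 / D` for every item gives the same probabilities as the one-parameter model, which is one way to see the two restrictions that the simpler model imposes.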
The program used to obtain the item and ability parameter estimates
for the three-parameter logistic model was the 1976 version of the
LOGIST program (Wood, Wingersky, & Lord, 1976). This program recognizes
three score categories: correct, incorrect, and omit. Although the
program is based on maximum likelihood estimation principles, substituting
a probability of correct equal to the reciprocal of the number of responses
for omitted items caused the resulting likelihood functions to only
approximate the usual functions. The technique has, therefore, been
labeled a quasi-maximum likelihood procedure. Lord (1974) has shown
that the quasi-maximum likelihood estimates converge to the maximum
likelihood estimates when the sample is large and omits are not
present. When omits are present, smaller variance estimates are
obtained than if the usual maximum likelihood procedures were used.
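
The treatment of omits can be sketched as follows. This is one reading of the description above, not code from LOGIST: it assumes that an omitted item enters the log likelihood with a fractional score equal to the reciprocal of the number of response alternatives. The function name, the use of `None` to mark an omit, and the example values are all assumptions made for this sketch.

```python
import math


def quasi_log_likelihood(responses, probs, n_choices):
    """Quasi log likelihood for one person.

    responses: 1 = correct, 0 = incorrect, None = omit (hypothetical coding).
    probs: model probability of a correct response to each item.
    n_choices: number of response alternatives per item.
    """
    total = 0.0
    for u, p in zip(responses, probs):
        if u is None:
            u = 1.0 / n_choices  # assumed fractional score for an omitted item
        total += u * math.log(p) + (1.0 - u) * math.log(1.0 - p)
    return total


# One correct, one incorrect, and one omitted four-choice item.
print(quasi_log_likelihood([1, 0, None], [0.8, 0.4, 0.5], 4))
```

With no omits the function reduces to the usual log likelihood for dichotomous items, which matches the statement that the two sets of estimates agree when omits are not present.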

Description of the Problem 
In comparing the one- and three-parameter models for use in item
calibration and ability estimation, five specific comparisons were
made. These included: (a) the evaluation of the goodness of fit of
each of the models to the item response data, (b) the determination of
the relationship between the ability estimates and the item responses,
(c) the determination of the predictive validity of the ability estimates
from the models in some limited cases, (d) the estimation of the minimum
sample size required for each model to calibrate tests, and (e) the
determination of the relationship between the ability estimates obtained
from the two models. The next section of this paper will describe
each of these comparisons in detail.

Method 

Goodness of fit

The initial evaluation of the two models dealt with the question
of which model fit the item response data better. Several goodness of
fit tests have been used for this previously, but it was felt that problems
existed in the approximations used and assumptions made by these methods.
Therefore, a new statistic was developed for the purposes of this comparison.
This statistic is given by the following formula:

$$ MSD_t = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{1}{N}\sum_{j=1}^{N}\left(x_{ij} - P_i(\theta_j)\right)^2\right] \tag{3} $$

where $MSD_t$ stands for the mean squared deviation for Test t, $x_{ij}$ is the
response to Item i by Person j, $P_i(\theta_j)$ is the probability of a correct
response to Item i for Person j determined for the model of interest,
$n$ is the number of items, and $N$ is the number of people. This statistic
ranges from 0 to 1, with a low value being desirable. If every item in a
test had zero discrimination, the $MSD_t$ statistic for the test would be
.25. Negatively discriminating items give a MSD value larger than .25.

The test MSD statistic was computed for each of the tests used in
this study for both the one-parameter and three-parameter logistic
models. The item parameters obtained from the calibration of the tests
with the models were used to compute the probability of a correct
response. Since the MSD statistics for the tests used for these
analyses were approximately normally distributed, a two-way analysis
of variance was performed on these test MSD values using the item MSD
statistics as observations. The item MSD value is the term within the
brackets in Equation 3. The two dimensions used in this analysis were
models and tests. Post hoc comparisons were used to find specific
differences in tests.
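
The mean squared deviation statistic described above can be sketched in Python as follows. The function names and the tiny data set are hypothetical; the model probabilities would in practice come from a calibrated one- or three-parameter model.

```python
def item_msd(responses, probs):
    """Mean squared deviation for one item over all people."""
    if len(responses) != len(probs):
        raise ValueError("responses and probs must have the same length")
    return sum((x - p) ** 2 for x, p in zip(responses, probs)) / len(responses)


def msd_for_test(response_matrix, prob_matrix):
    """Average of the item MSD values; each row holds one item, each column one person."""
    item_values = [item_msd(r, p) for r, p in zip(response_matrix, prob_matrix)]
    return sum(item_values) / len(item_values)


# Two items answered by four people (hypothetical data).
responses = [[1, 0, 1, 0],
             [1, 1, 0, 0]]

# A flat probability of .5 stands in for items that do not discriminate at all.
flat = [[0.5] * 4, [0.5] * 4]
print(msd_for_test(responses, flat))  # 0.25
```

With a probability of .5 for everyone, each squared deviation is .25 whatever the response, which reproduces the .25 benchmark quoted in the text. Probabilities that track the responses more closely give smaller values.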






Relation of ability estimates to item responses

A second analysis that evaluated the relationship between the two
latent trait models and the item responses was the computation of the
multiple correlation between the ability estimates obtained from a test
and the set of item responses from the same test. This was done to
determine the variance in common between the responses and the ability
estimates. The multiple correlation was computed for each test used in the
study and the magnitude of the values was compared using the correlated t
statistic to determine if there was a significant difference in the variance
in the item responses accounted for by the models.
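
As an illustration of the multiple correlation described above, the sketch below regresses an ability estimate on the item responses by ordinary least squares and returns the multiple R. It is not the program used in the study; the function names and data are hypothetical, and the normal equations are solved by a small Gaussian elimination routine so that only the standard library is needed.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def multiple_correlation(item_responses, ability):
    """Multiple R from regressing ability on the item responses (one row per person)."""
    rows = [[1.0] + [float(v) for v in person] for person in item_responses]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ability)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bi * xi for bi, xi in zip(beta, r)) for r in rows]
    mean_y = sum(ability) / len(ability)
    ss_total = sum((y - mean_y) ** 2 for y in ability)
    ss_error = sum((y - f) ** 2 for y, f in zip(ability, fitted))
    return math.sqrt(max(0.0, 1.0 - ss_error / ss_total))


# Seven people, three items (hypothetical data).
items = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1], [1, 0, 1]]
raw_scores = [sum(person) for person in items]
print(multiple_correlation(items, raw_scores))
```

In this example the criterion is the raw score itself, an exact linear function of the item responses, so the multiple correlation is 1 up to rounding. An ability estimate that weights items unequally would generally give a lower value.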

Concurrent validity of ability estimates

For several limited cases, other test scores were available for the
individuals taking the tests to be calibrated. Although the samples
for these tests were relatively small, the opportunity to relate the
ability estimates from the latent trait models to the other tests
could not be passed by. Three separate samples were used
in correlating the ability estimates from the two models with these
other tests. The resulting correlations were compared statistically to
determine which model yielded the larger validity coefficient.

Sample size requirements

An important question that has only been touched upon in the research
literature (Cypress, 1972) is the sample size required for accurate
estimation of the parameters of the two models. To more thoroughly
explore the sample size limitations of the models, seven samples of various
sizes were drawn from the students taking a standardized test. Parameter
estimates were obtained for each of these samples and the results were
compared to the calibration results based on 2,997 cases using a squared
deviation statistic. That is, for each of the item parameters derived
using the two models, the smaller sample estimates were subtracted from
the large sample values, the difference squared, and the results summed.
The average squared differences for the parameters were compared across
sample sizes using analysis of variance techniques to determine the
minimum sample sizes that yield adequate parameter estimates.
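
The squared deviation statistic described above amounts to the following computation. This is a sketch with hypothetical item difficulty estimates; the function name is an assumption made for the example.

```python
def mean_squared_difference(large_sample, small_sample):
    """Average squared difference between two sets of estimates for the same items."""
    if len(large_sample) != len(small_sample):
        raise ValueError("both calibrations must cover the same items")
    squared = [(big - small) ** 2 for big, small in zip(large_sample, small_sample)]
    return sum(squared) / len(squared)


# Hypothetical difficulty estimates for three items from a large and a small sample.
large = [-1.0, 0.0, 1.0]
small = [-1.5, 0.0, 2.0]
print(mean_squared_difference(large, small))
```

The per-item squared differences, rather than only their average, are what an analysis of variance across sample sizes would take as observations.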






Ability parameter comparisons 

In order to determine whether the ability parameter estimates derived
from the two models were measuring the same component, the ability
estimates were correlated with each other, with the raw scores from the
tests, and with factor scores on the first factor on the tests. These
correlations were determined for each of the sixteen tests used in this
study. The factor scores were generated from factor analyses using both
phi and tetrachoric correlations.



Data Sources 



Live testing data-sets 

The sixteen data-sets used in this study are described in Table 1
along with the abbreviations used for each and the sample size used for
calibration. The first eight of the data-sets listed were obtained from
the administration of two types of tests to groups of students. One test
used was the Missouri School and College Ability Test (MSCAT). Data
from administration of this test throughout the state of Missouri was
available for the 1975 and 1976 school years. The test is comprised of
two subtests which were calibrated separately.



Insert Table 1 about here



The other type of live testing data available for calibration
was obtained from the administration of four classroom examinations on
the use of standardized tests. The data was collected using a large
undergraduate measurement course during the period from October 1975 to
May 1977. Both the standardized and classroom tests were fifty item,
multiple-choice tests.



Along with these data-sets, seven other samples were obtained from
MSCATV6 to determine sample size effects. Systematic sampling was used,
yielding samples of 2,997, 2,197, 1,525, 1,090, 763, 382, and 150.
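
The text does not say how the systematic samples were drawn, so the sketch below shows only one common form of systematic sampling: taking cases at evenly spaced positions through an ordered file. The function name and the choice to start at the first case are assumptions.

```python
def systematic_sample(cases, n_wanted):
    """Take n_wanted cases at evenly spaced positions through an ordered list."""
    if not 0 < n_wanted <= len(cases):
        raise ValueError("n_wanted must be between 1 and the number of cases")
    step = len(cases) / n_wanted
    return [cases[int(i * step)] for i in range(n_wanted)]


# Every second case from a file of ten.
print(systematic_sample(list(range(10)), 5))  # [0, 2, 4, 6, 8]

# A sample of 150 drawn from a file of 3,126 cases.
subsample = systematic_sample(list(range(3126)), 150)
print(len(subsample))  # 150
```

Using a fractional step keeps the selected cases spread over the whole file even when the file size is not an exact multiple of the sample size.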

Simulation data sets 

In order to gain greater control over the characteristics of the
data, eight simulated test data-sets were produced. These were generated
to match various factor loading matrices using the usual linear factor
analysis model. The simulation procedure generated z-scores for each
person on each item using a weighted sum of normal random numbers and
then dichotomized them to yield the proportion of correct and incorrect
responses specified by the traditional difficulty indices. Guessing did
not enter into the production of the simulated data-sets. A sample of
1,000 cases was generated for each of the eight simulated tests.
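
The simulation procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the original simulation program: it assumes independent standard normal factors, a unique component weighted so that each z-score has unit variance, and a cut point taken from the normal distribution so that the expected proportion correct equals the target difficulty index. All names and values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_responses(loadings, difficulties, n_people, seed=0):
    """Simulate dichotomous item responses from a linear factor model.

    loadings[i][k]: loading of item i on factor k.
    difficulties[i]: target proportion correct for item i (strictly between 0 and 1).
    Returns one list of 0/1 item scores per simulated person.
    """
    rng = random.Random(seed)
    n_factors = len(loadings[0])
    # An item is scored correct when its z-score exceeds this cut point.
    cuts = [NormalDist().inv_cdf(1.0 - p) for p in difficulties]
    # Weight of the unique part, chosen so each z-score has unit variance.
    unique = [math.sqrt(1.0 - sum(w * w for w in row)) for row in loadings]
    data = []
    for _ in range(n_people):
        factors = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]
        person = []
        for row, u, cut in zip(loadings, unique, cuts):
            z = sum(w * f for w, f in zip(row, factors)) + u * rng.gauss(0.0, 1.0)
            person.append(1 if z > cut else 0)
        data.append(person)
    return data


# Two items loading .9 on a single factor, with target difficulties .5 and .7.
data = simulate_responses([[0.9], [0.9]], [0.5, 0.7], n_people=2000, seed=1)
proportions = [sum(person[i] for person in data) / len(data) for i in range(2)]
print(proportions)
```

No guessing enters the sketch, in keeping with the description: a response is correct only when the simulated z-score clears the cut point.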

Four levels of factorial complexity were used in generating these
data-sets: one-factor, two-factor, five-factor, and nine-factor. The
size of the factor loadings and distribution of difficulties were also
varied for the simulated tests. Normal, rectangular, and constant
distributions of difficulties were used, although no attempt was made to
include all possible combinations. The distribution of difficulties
referred to here is based on the proportion correct index.


Results 

Goodness of fit

The test MSD statistics for the sixteen data-sets for each of the models
are presented in Table 2 along with the analysis of variance results. The
analysis of variance performed on this data was a two-way analysis with
repeated measures on one dimension. The independent variables were test
and type of logistic model.



Insert Table 2 about here 



The results of the analysis of variance show that the three-parameter
model fits the data significantly better than the one-parameter model,
although the difference in the overall means is only .004. However, for
every data-set the average deviation from fit was smaller for the
three-parameter model than for the one-parameter model. The MSD values were
also found to be significantly different across tests. The one-factor
data-set (150AR) was fit best by the models, as would be expected, and
the nine-factor data-set (950AN3) had the worst fit, also as expected. No
significant interaction was found in the data.

To further rank the tests in terms of fit of the models, the
Newman-Keuls post hoc comparison procedure was used to determine if there were
significant differences in the fit of specific tests. The results of this
analysis are presented at the bottom of Table 2. As can be seen from the
results presented there, the 150AR data-set is fit by the models significantly
better than any of the other tests. This is the one simulated test that
meets all of the assumptions of both models. It contains only one factor,
all of the items are equally discriminating, and no guessing is present.

The 250AR data-set has the next best fit for the models. It has two
factors, a wide range of item difficulties, and no guessing. Although the
fit for this test is significantly worse than for 150AR, it is significantly
better than all but one of the other tests. The majority of the other
data-sets are fit about equally well by the two models.

At the poor fitting end of the continuum are three sets of simulation
data: 550AN7, 950AN9, and 950AN3. All of these simulated tests have a
relatively large number of independent factors. Data-set 950AN3 is the
worst fitting of the tests, having a MSD statistic very close to the value
of .25 expected when all items have zero discrimination. This simulated
test has low loadings (.3) on the nine independent factors.

The trend of this analysis suggests that the multidimensionality of
the tests is a definite factor in the fit of the two models. The
three-parameter logistic model handles this deviation from the assumptions
significantly better than the one-parameter model, but the ordering of
the effect is the same, as is shown by the lack of a significant interaction.

In order to determine the relationship between ability estimates and
the item responses, the multiple correlation between the ability estimates
from each model and the item responses was computed. These values
are presented in Table 3 for the ability estimates correlated with the items
from the sixteen data-sets. Note that all of the correlations with the
one-parameter ability estimates are extremely high, as they must be
because of the sufficient statistic property of the model. Multiple
correlations are high for the three-parameter ability estimates when
a dominant factor is present, but drop when independent, equally weighted
factors are present.



Insert Table 3 about here 



A related t-test was performed on the mean multiple correlations for
the two ability estimates to determine if the observed differences were
significant. The difference in the mean multiple correlations was
significant at beyond the .005 level, indicating that the three-parameter
ability estimate correlations are significantly lower.
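
A related (paired) t statistic of the kind referred to above can be computed as in the following sketch. The correlation values are hypothetical and are not taken from Table 3; the function name is an assumption.

```python
import math


def related_t(first, second):
    """Paired t statistic and degrees of freedom for two matched sets of values."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two matched lists with at least two values each")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n), n - 1


# Hypothetical multiple correlations for the same four tests under each model.
one_parameter = [0.99, 0.98, 0.97, 0.99]
three_parameter = [0.95, 0.90, 0.96, 0.91]
t_value, df = related_t(one_parameter, three_parameter)
print(t_value, df)
```

The test works on the within-test differences, so tests that are fit well or poorly by both models do not inflate the error term. The sketch assumes the differences are not all identical, since the variance would then be zero.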

Concurrent validity of ability estimates

The concurrent validity of the ability estimates for the two models
was determined by correlating the estimates obtained from the final exams for
three different semesters of an undergraduate measurement course with
the first and second exam in the same semester. The correlations between
the ability estimates and the raw scores on the criterion measures are
presented in Table 4. In all but one case, the one-parameter ability
estimates have higher correlations with the criteria than the
three-parameter estimates. However, in no case were the differences in
correlations for the two models significant. One reason for the slightly
lower correlations for the three-parameter model could be the small sample
size used in this analysis, which will be shown later to affect the
three-parameter model more than the one-parameter model, causing unstable
estimates.


Insert Table 4 about here 



Sample size requirements

The average squared deviations for the item parameter estimates from
the seven subsamples as well as the squared deviations obtained from a
second 2,997 sample are presented in Table 5 along with the F results
used to determine if any significant differences existed. One-way repeated
measures analyses of variance were performed using the squared difference
values for the fifty items as the dependent measures with sample size as
the independent variable.




Insert Table 5 about here



The means of three of the four sets of item parameters give a similar
pattern of results. The 2,997 sample has the smallest mean squared
deviation, while the deviations tend to get larger with decreasing sample
size. This relationship is strong for the one-parameter easiness parameter
and the three-parameter discrimination parameter, while the three-parameter
difficulty and guessing parameters show considerable variation. The
analysis of variance results show significant differences in all cases
except for the three-parameter difficulty parameter. In that case,
although there are large differences in the means, the large variation in
estimates resulted in a failure to reject. An F-max test for heterogeneity
of variance yielded a value of 2,527, easily rejecting the hypothesis of
homogeneity. A subsequent analysis on values after a logarithmic
transformation yielded a significant F.

The purpose of this set of analyses was to determine at what point a
decrease in sample size would adversely affect the results of item
calibration. This question was addressed directly in a post hoc analysis
performed using the ANOVA results. Using the mean squared deviation values
for each sample size, the Newman-Keuls post hoc procedure was used to
determine the largest sample that was significantly different from the
mean squared deviation from the 2,997 sample. The results of these analyses
are also presented in Table 5. Samples that are not significantly different
are underlined. Those that are different do not share the same underline.
Due to the great variation in the 3PL difficulty values, the results
of this study were not easily interpreted, indicating the need for further
research. However, some general conclusions can be drawn from the data.
The 1PL easiness parameters seem to have stabilized when the sample
size is greater than 382. A sample somewhere between 382 and 763 is
probably the lower limit required when using this model. The 3PL data are
harder to interpret. The 3PL discrimination parameters seem to be
moderately stable above the 150 sample, but the mean squared deviations
for the 3PL difficulty values are far from stable, with values for the
[illegible] sample of about the same size as squared deviations for the 1PL
easiness parameter for the 382 sample. Although these values are not on
precisely the same scale, the values should be somewhat comparable. This
result suggests that the 3PL difficulty parameters are just starting to
stabilize. The heterogeneity of variance in the analysis of the difficulty
parameters reduces its usefulness; however, the 150 sample is clearly worse
than the rest. Overall the results suggest that substantially larger
samples are required for the 3PL model. The guessing parameter does not
enter into this discussion because of the numerous restrictions placed upon
it in the calibration program.

Ability parameter comparisons

The correlations between the ability parameter estimates for the two
models with the raw scores and selected factor scores for the tests are
given in Table 6. In seven of the eight live testing data-sets, the
correlations between the ability estimates from the two models are .90 or
above. There is much greater variation in the simulation data, probably
due to the multi-factor nature of the tests. However, even there the
correlations are high when a dominant first factor is present.


Insert Table 6 about here 

The correlations with the raw scores on the tests and the first
factor scores are uniformly high for the live testing data, although the
one-parameter model generally has slightly higher correlations than the
three-parameter model. Again, there is greater variation for the simulation
data, with lower correlations for the three-parameter model when no dominant
factor is present. The one-parameter model [illegible] the ability parameter.

In general, the results show that in most cases the two models are
measuring the same thing, the first factor of the test. When a dominant
first factor is not present, there are major differences in the correlations.
Reckase (1977) discusses these differences in much greater detail than
can be done here.

Discussion and Conclusions

Five comparisons were made in the study reported here relative to the
quality of the estimates of parameters obtained from the one- and
three-parameter logistic models. The results can be summarized briefly as
follows: (a) the three-parameter model fit the test data better in all cases
than did the one-parameter model and there was a trend in the fit related to
the dimensionality of test; (b) the one-parameter model ability estimates
shared more variance with the item responses than the three-parameter model;
(c) there was no difference in the concurrent validity for small samples
using the two models in predicting classroom achievement tests; (d) the
one-parameter model required smaller samples for calibration than the
three-parameter model; and (e) the ability estimates from the two models
correlated highly for most of the data-sets.

From these results, certain conclusions can be drawn concerning the
use of these two models with fifty item group exams when sample sizes of
approximately two hundred are available. First, from the ability estimate
comparisons, it seemed that the two models estimate the same latent trait
when there was a dominant first factor, even when it accounted for a small
amount of the variance. The concurrent validity data also supported this
point of view, since the magnitude of the correlations was essentially the
same. Since the sample size required to obtain stable parameters was
smaller for the one-parameter model and the overall representation of the
data was better as reflected by the multiple correlations, the
one-parameter model is preferred for use with small sample group data to
predict outside criterion variables.

The goodness of fit data reflected a different point of view, however.
The three-parameter model fit all the data-sets better than the
one-parameter model. This result may be important when accurate estimation of
item parameters is important, such as in the area of tailored testing.
A tailored testing comparison of the two models done by Koch & Reckase (1978)
supports this point of view, showing the three-parameter procedure to
yield superior results to the one-parameter procedure for a tailored
testing application.

Although this research does give valuable information that will be
helpful in selecting between these two latent trait models, much further
research is required. Specifically, validity studies based on larger
samples and other criterion variables are needed to allow generalization of the
findings. Also the sample size determinations need to be more precise than
those reported here.



Table 1
Description of Data-Sets*

Test Name                         Abbreviation  Sample  Description
                                                Size

 1. Missouri School and College   MSCATV5       3,087   Systematic sample from 57,800 cases
    Ability Tests Verbal/1975                           from Missouri Statewide Testing
                                                        Program 1974-1975. SCAT Series II
                                                        Form 2B.

 2. Missouri School and College   MSCATQ5       3,087   Systematic sample from 57,800 cases
    Ability Tests                                       from Missouri Statewide Testing
    Quantitative/1975                                   Program 1974-1975. SCAT Series II
                                                        Form 2B.

 3. Missouri School and College   MSCATV6       3,126   Systematic sample from 65,600 cases
    Ability Tests Verbal/1976                           from Missouri Statewide Testing
                                                        Program 1975-1976. SCAT Series II
                                                        Form 2B.

 4. Missouri School and College   MSCATQ6       3,126   Systematic sample from 65,600 cases
    Ability Tests                                       from Missouri Statewide Testing
    Quantitative/1976                                   Program 1975-1976. SCAT Series II
                                                        Form 2B.

 5. Exam on Standardized Testing  ST1075        208     Undergraduate course final exam
                                                        administered in October 1975.

 6. Exam on Standardized Testing  ST0576        181     Undergraduate course final exam
                                                        administered in May 1976.

 7. Exam on Standardized Testing  ST1076        176     Undergraduate course final exam
                                                        administered in October 1976.

 8. Exam on Standardized Testing  ST3-577       312     Undergraduate course final exam
                                                        administered to two sections of the
                                                        course in March and May 1976.

 9. One factor rectangular        150AR         1,000   One factor with loadings of .9,
    simulation data                                     rectangular distribution of
                                                        difficulties.

10. Two factor normal             250AN         1,000   Loadings of .9 and .0 randomly
    simulation data                                     distributed on two factors, normal
                                                        distribution of difficulties.

11. Two factor rectangular        250AR         1,000   Loadings of .9 and .0 randomly
    simulation data                                     distributed on two factors,
                                                        rectangular distribution of
                                                        difficulties.

12. Two factor .5                 250A5         1,000   Loadings of .9 and .0 randomly
    simulation data                                     distributed on two factors. All
                                                        items .5 difficulty.

13. Nine factor Spearman          950ANS        1,000   One factor .7 loadings for all
    simulation data                                     items. Eight factors .6 loadings
                                                        randomly distributed over items.
                                                        Normal distribution of difficulties.

14. Nine factor independent       950AN9        1,000   Items randomly distributed to nine
    .9 loading simulation data                          factors with .9 loadings. Normal
                                                        distribution of difficulties.

15. Nine factor independent       950AN3        1,000   Items randomly distributed to nine
    .3 loading simulation data                          factors with .3 loadings. Normal
                                                        distribution of difficulties.

16. Five factor independent       550AN7        1,000   Items randomly distributed to five
    .7 loading simulation data                          factors with .7 loadings. Normal
                                                        distribution of difficulties.

*All tests are 50 items in length.




Table 2

Squared Deviations from the Two Models
for the Sixteen Data-Sets

Test            One Parameter   Three Parameter   Test Means
                Logistic        Logistic
 1. MSCATV5     .169            .166              .167
 2. MSCATQ5     .164            .160              .162
 3. MSCATV6     .169            .166              .167
 4. MSCATQ6     .166            .161              .163
 5. ST1075      .144            .138              .141
 6. ST0576      .167            .165              .166
 7. ST1076      .159            .154              .156
 8. ST3-577     .184            .182              .183
 9. 150AR       .068            .067              .068
10. 250AN       .162            .153              .158
11. 250AR       .122            .115              .118
12. 250A5       .185            .176              .180
13. 950ANS      .156            .156              .156
14. 950AN9      .211            .204              .208
15. 950AN3      .223            .222              .222
16. 550AN7      .210            .206              .208

Source                        Sum of Squares  d.f.  Mean Square  F        Significance
Tests                         1.995           15    .133         31.667   .001
Items within tests            3.301           784   .004
Models                        .007            1     .007         14.684   .001
Tests X Models                .003            15    .0002        .414
Models X Items within tests   .355            784   .0005

Post Hoc Comparisons Using Newman-Keuls Test

        Poor fit                                                            Good fit
Test    15.  14.  16.  8.  12.  1.  3.  6.  4.  2.  10.  7.  13.  5.  11.  9.




Table 3

Multiple Correlations Among
Ability Estimates and Test Items

                 Ability Estimate
Test        1PL      3PL      1PL-3PL
MSCATV5     .991     .983     .008
MSCATQ5     .998     .985     .003
MSCATV6     .993     .988     .005
MSCATQ6     .991     .983     .008
ST1075      .994     .944     .050
ST0576                        .041
ST1076      .985     .967     .018
ST3-577                       .011
150AR       .990     .997     -.007
250AN       .981     .677     .304
250AR       .991     .948     .043
250A5       .978     .839     .139
950ANS      .983     .949     .034
950AN9      .998     .852     .146
950AN3      .9998    .890     .1098
550AN7      .998     .866     .132

Mean        .9906    .9253    .07149

t = 3.705
p < .005




Table 4

Correlations between Ability Estimates
and Two Classroom Tests

                    Ability           Test
Data       N        Estimate    Exam 1    Exam 2
ST1076     176      1PL         .555      .661
                    3PL         .492      .599
ST0576     181      1PL         .409      .47?
                    3PL         .364
ST1075     208      1PL         .558      .576
                    3PL         .498      .5?5
Table 5

Comparison of Parameter Squared
Deviations for the Two Models by Sample Size

Sample   1PL        3PL               3PL              3PL
Size     Easiness   Difficulty        Discrimination   Guessing
150      .0483      .1811 (.1326)     .2187            .0014
382      .0196      .1413 (.0847)     .0973            .0009
763      .0063      .0272 (.0258)     .0615            .0020
1090     .0063      .1930 (.0821)     .0585            .0009
1525     .0055      .0299 (.0263)     .0589            .0012
2197     .0047      .0138 (.0135)     .0335            .0011
2997     .0041      .0166 (.0162)     .0241            .0008

Values in parentheses are transformed means using log(x+1).

ANOVA: 1PL Easiness
Source     d.f.   SS         MS         F       p
Samples    6      .0791      .0132      18.17   <.0001
Error      294    .2133      .0007

ANOVA: 3PL Difficulty
Source     d.f.   SS         MS         F       p
Samples    6      2.009      .335       1.50    N.S.
Error      294    65.643     .225

ANOVA: 3PL Discrimination
Source     d.f.   SS         MS         F       p
Samples    6      1.303      .217       7.30    <.0001
Error      294    8.743      .030

ANOVA: 3PL Guessing
Source     d.f.   SS         MS         F       p
Samples    6      .000055    .000009    3.44    <.003
Error      294    .000787    .000003

ANOVA: Transformed 3PL Difficulty
Source     d.f.   SS         MS         F       p
Samples    6      0.627      .104       3.61    <.002
Error      294    8.506      .029

Post Hoc Comparisons

1PL Easiness         2997  2197  1525  1190  763   382   150
3PL Difficulty       2997  2197  763   1525  1190  382   150
3PL Discrimination   2997  2197  1190  1525  763   382   150
3PL Guessing         2997  1190  382   2197  1525  150   763
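
The F ratios in the sample-size ANOVA summaries above follow from the printed sums of squares and degrees of freedom: each mean square is SS divided by its d.f., and F is the mean square for samples divided by the mean square for error. The sketch below recomputes them as an arithmetic check. The function name `f_ratio` and the dictionary layout are illustrative assumptions, not part of the original report; the numbers are the SS values as printed, with 6 and 294 degrees of freedom.

```python
def f_ratio(ss_between, df_between, ss_error, df_error):
    """Return (MS_between, MS_error, F) from sums of squares and d.f."""
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return ms_between, ms_error, ms_between / ms_error


# (SS for samples, SS for error) as printed in the ANOVA summaries.
# Names and structure here are hypothetical, chosen for illustration only.
summaries = {
    "1PL Easiness": (0.0791, 0.2133),
    "3PL Difficulty": (2.009, 65.643),
    "3PL Discrimination": (1.303, 8.743),
    "Transformed 3PL Difficulty": (0.627, 8.506),
}

DF_SAMPLES, DF_ERROR = 6, 294  # seven sample sizes, 294 error d.f.

f_values = {
    name: f_ratio(ss_s, DF_SAMPLES, ss_e, DF_ERROR)[2]
    for name, (ss_s, ss_e) in summaries.items()
}

for name, f in f_values.items():
    print(f"{name}: F = {f:.2f}")
```

For these four summaries the recomputed ratios agree with the printed F values to two decimals. The guessing-parameter summary is left out because its sums of squares are printed to only two significant digits, so a recomputed F can differ slightly from the printed 3.44.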
Table 6



Correlation between Ability Estimates,
Raw Scores, and Factors for the Sixteen Data-Sets

                                          Phi         Tet
           Ability    Raw      3PL        Principal   Principal
Data-set   Estimate   Score    Ability    Component   Component
MSCATV5    3PL        97                  98          98
           1PL        99       96         95          97
MSCATQ5    3PL        97                  98          98
           1PL        99       97         97          97
MSCATV6    3PL        98                  99          99
           1PL        99       97         98          98
MSCATQ6    3PL        97                  98          98
           1PL        99       96         97          97
ST1075     3PL        83                  89          32
           1PL        99       85         89          29
ST0576     3PL        88                  91          87
           1PL        99       90         93          88
ST1076     3PL        89                  94          91
           1PL        98       90                     86
ST3-577    3PL        95                  98          98
           1PL        99       95         97          97
150AR      3PL        97                  97          98
           1PL        95       99         95          97
250AN      3PL        59                  59          56
           1PL        98       66         98          97
250AR      3PL        71                  69          92
           1PL        99       73         99          74
250A5      3PL        82                  56          62
           1PL        98       83         76          83
950ANS     3PL        93                  93          94
           1PL        98       96         98          98
950AN9     3PL        62                  82          67
           1PL        99       62         72          72
950AN3     3PL        71                  36          41
           1PL        100      71         25          33
550AN7     3PL        70                  46          36
           1PL        100      70         32          27

Note: All values presented without decimal points.
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